Abstract

Background: The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable
crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for
molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation 
when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, 
as much as eight times larger than that of other cucurbits. 

Results: The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The 
chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small 
(SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A 
comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of 
nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 
95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional 
unscaffolded contigs. About 84% of the mitochondrial genome is contained in a single scaffold. The gene-coding
region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 
3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit 
species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence,
including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive 
sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, 
respectively. 

Conclusions: Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit 
species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene 
number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the 
melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest 
mitochondrial genome reported so far. 
Background

The melon (Cucumis melo L.) is an important vegetable
crop grown in temperate, subtropical and tropical 
regions worldwide. The melon belongs to the Cucurbi- 
taceae family, which also comprises other vegetable 
crops such as cucumber, watermelon, pumpkin and 
squash, and whose economic importance among vegetable
crops is second only to Solanaceae. C. melo is a
diploid species (2x = 2n = 24) with an estimated haploid 
genome size of 454 Mb [1]. In recent years, extensive 
research has been performed in melon to elucidate fruit 
ripening processes, carotene accumulation and aroma 
production [2]. In addition, genomic approaches to 
melon breeding have been successfully applied to the 
molecular characterisation of important agronomic 
traits, such as pathogen resistance [3,4] and sex determi- 
nation [5,6]. Recent research has increased the availabil- 
ity of genetic and genomic resources for melon [7], such 
as the sequencing of ESTs [8,9], the development of an 
oligonucleotide-based microarray [10], the construction 
of BAC libraries [11-13], the production of mutant
collections for TILLING analyses [14-16], the development
of a collection of near-isogenic lines (NILs) [17], the 
construction of several genetic maps [9,18-22] and the 
development of a genetically anchored BAC-based
physical map [23].

The MELONOMICS project, aimed at sequencing the
complete melon genome using a whole-genome shotgun 
strategy, was recently initiated by a Spanish consortium 
[24]. Determination of the complete melon genome also 
includes sequencing of the chloroplast (cpDNA) and 
mitochondrial (mtDNA) genomes. As of 6 June 2011, 
the NCBI databases contain 220 eukaryotic plastid genome
records [25]. Comparative studies have indicated
that the chloroplast genomes of land plants are highly 
conserved in both gene order and gene content and are 
moderately sized, between 130 and 150 kb [26]. In con- 
trast, plant mitochondrial genomes range from 200 to 
2,400 kb in size, which is at least 10 to 100 times the 
size of typical animal mitochondrial genomes [27,28]. 
Cucurbitaceae possess the largest known plant mito- 
chondrial genomes; however, species that belong to the 
same genera within Cucurbitaceae and have similar 
nuclear genome sizes show great size differences in their 
mitochondrial genomes [27]. Experimental procedures 
based on kinetic reassociation rate measurements have 
predicted a melon mitochondrial genome of 2,400 kb, 
the largest one among plants and animals and compar- 
able in size to the genomes of many free-living bacteria 
[27,29]. Recently, the mitochondrial genomes of Citrul- 
lus lanatus (watermelon) (379 kb) and Cucurbita pepo 
(squash) (983 kb) have been determined [30]. Sequence 
analysis of these mitochondrial DNAs has suggested
that the increased genome size in this family reflects an
accumulation of chloroplast-derived and short repeated 
sequences, whereas protein-coding regions are con- 
served across these species, with minor exceptions 
[30,31]. In general terms, DNA transfer from organellar 
genomes to nuclear DNA, and vice versa, appears to be 
a common phenomenon associated with the redistribu- 
tion of genetic material between nuclear and organellar 
genomes [32-35]. Furthermore, a reduction in organelle 
DNA content is linked to a gradual loss of the genetic 
autonomy of organelles [34,36,37]. 

Next-generation sequencing platforms are rapidly 
changing the field of genomics, allowing both re-sequen- 
cing and de novo sequencing of whole genomes with a 
significant reduction in cost and time relative to conven- 
tional approaches. Nevertheless, only a few examples of 
plastid genome next-generation sequencing have been 
published so far and no plant mitochondrial genome 
has been sequenced that way [38-44]. In this article, we 
report the complete sequence of the melon chloroplast 
genome obtained from BAC end sequences (BES), and
we report an estimated 95% of the melon mitochondrial 
genome determined using Roche-454 sequencing tech- 
nology. With a size over 2.7 Mb, the mitochondrial gen- 
ome of melon represents the largest mitochondrial 
genome sequenced so far. Data on the structure and 
content of both organellar genomes and a comparison 
to published cucumber chloroplast and watermelon and 
squash mitochondrial genomes are presented. 

Results and discussion 

Organisation of the Cucumis melo chloroplast genome 

The complete nucleotide sequence of the chloroplast 
genome of melon (C. melo subsp. melo, PIT92) was 
determined (GenBank Acc. No. JF412791). The genome 
was 156,017 bp long and included a pair of inverted 
repeats (IRa and IRb) of 25,797 bp separated by small 
(SSG) and large (LSG) single-copy regions of 18,090 and 
86,334 bp, respectively (Figure 1, Table 1). The GG con- 
tent was found to be 36.9%, which is identical to that of 
cucumber, the only other reported cucurbit chloroplast 
genome [45-47], and to other sequenced plant chloro- 
plast genomes. 
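As an illustration of how a GC content figure such as 36.9% is obtained, the following minimal sketch counts G and C bases over all unambiguous bases of a sequence. The function name `gc_content` and the choice to ignore ambiguous symbols such as N are assumptions made for this example, not a description of the software used in this work.

```python
def gc_content(seq):
    """Return the GC content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; symbols such as N
    are ignored (an assumption of this sketch).
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Toy sequence for demonstration only, not melon data.
print(round(gc_content("ATGCGCNNAT"), 1))
```

Applied to the deposited sequence (GenBank Acc. No. JF412791), a calculation of this kind would be expected to give a value close to the one reported here.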

The melon chloroplast genome contains 132 genes, 
including 98 single-copy genes and 17 duplicated in IR 
regions (Figure 1 and Table 2). The gene-coding regions 
accounted for 59.7% of the genome and included 75 
protein-coding genes and 6 conserved ORFs, 4 rRNA 
genes and 30 tRNA genes, which represented 51.6%, 
2.9% and 5.2% of the total sequence, respectively; cis- 
spliced introns accounted for 12.1% of the genome. The 
genes clpP, rps12 and ycf3 contained two introns, while
15 additional genes contained one intron each. The
rps12 gene was found to undergo trans-splicing, with
the 5' exon located in the LSC region and the other two 
exons located in both IR regions. 
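The gene tally and the sequence fractions quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text and in Table 1; the variable and function names are illustrative assumptions.

```python
# Figures quoted in the text and in Table 1 for the melon chloroplast genome.
GENOME_SIZE = 156_017      # nt
SINGLE_COPY_GENES = 98
DUPLICATED_GENES = 17      # each present once in IRa and once in IRb


def total_genes(single_copy, duplicated):
    """Count every gene copy: a duplicated gene contributes two copies."""
    return single_copy + 2 * duplicated


def percent_of_genome(length_nt, genome_size=GENOME_SIZE):
    """Share of the genome covered by a feature class, to one decimal."""
    return round(100 * length_nt / genome_size, 1)


feature_lengths = {
    "protein coding": 80_580,
    "tRNAs and rRNAs": 12_629,
    "cis-spliced introns": 18_822,
}
shares = {name: percent_of_genome(nt) for name, nt in feature_lengths.items()}

print(total_genes(SINGLE_COPY_GENES, DUPLICATED_GENES))
print(shares)
```

Counting duplicated genes twice reproduces the total of 132 genes, and the computed shares agree with the percentages listed in Table 1.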

The border sequences between the IR, LSC and SSC 
regions vary among different species. The pattern in 
melon is similar to that in cucumber, as described in 



Table 1 C. melo chloroplast genome characteristics

Total size [nt]               156,017
GC content                    36.9%
Gene number                   132^a
Protein genes                 87 (81)^b
rRNA genes                    8 (4)^b
tRNA genes                    37 (30)^b
Single-copy genes             98
Duplicated genes              17
Genes with introns            18
Trans-spliced genes           1
Coding sequences [nt]         93,209 (59.7%)
Protein coding [nt]           80,580 (51.6%)
tRNAs and rRNAs [nt]          12,629 (8.1%)
Non-coding sequences [nt]     62,809 (40.3%)
cis-spliced introns [nt]      18,822 (12.1%)
Intergenic sequences [nt]     43,987 (28.2%)

^a Duplicated genes counted as two
^b In parentheses, duplicated genes counted as one



[45]. In particular, IRa extended 1,199 bp into the ycf1
gene, and the IRb/SSC border was within the coding
regions of the ycf1-like and ndhF genes, which overlap
by 32 bp. The IRa/LSC border was located downstream
of the trnH-GUG gene, whereas the ψrps19 gene,
present in that region in other species such as Arabidopsis
thaliana, was absent in both melon and cucumber.
Finally, the IRb/SSC border extended 2 bp into the 5'
coding region of the rps19 gene, as in cucumber.

The melon chloroplast genome was screened for simple 
sequence repeats (SSRs), which resulted in the identifica- 
tion of 69 microsatellites that were at least 10 nt in length 
(1 to 2 nt repeats) or contained at least four tandem repeat 
units (3 to 6 nt repeats). All the microsatellites found were 
shorter than 18 bp. SSRs accounted for 796 bp (0.5%) of 
the total sequence, which was similar to the SSR content 
estimated for the melon nuclear genome [48]. The poly 
(A)/poly(T) microsatellite was the only mononucleotide 
repeat found and represented 79.7% of all SSRs found. 
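The SSR criteria described above (mono- and dinucleotide repeats of at least 10 nt, and motifs of 3 to 6 nt repeated at least four times) can be expressed compactly with regular expressions. The following is a minimal sketch under those criteria; the function names `is_composite` and `find_ssrs` are hypothetical, and the code is not the program used for the analysis reported here.

```python
import re


def is_composite(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_len_short=10, min_units_long=4):
    """Return (start, end, motif, units) for perfect SSRs in seq.

    Motifs of 1-2 nt must span at least min_len_short nt; motifs of
    3-6 nt must occur in at least min_units_long tandem units.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, 7):
        # Ceiling division gives the unit count needed to reach min_len_short.
        min_units = -(-min_len_short // k) if k <= 2 else min_units_long
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_composite(motif):
                continue  # already reported under its shorter motif
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy sequence for demonstration only, not melon data.
toy = "GG" + "A" * 12 + "CGC" + "AT" * 5 + "T" + "AGC" * 4 + "TT"
for start, end, motif, units in find_ssrs(toy):
    print(start, end, motif, units)
```

On the toy sequence, the sketch reports a poly(A) run of 12 nt, an (AT)5 repeat and an (AGC)4 repeat. Imperfect or compound microsatellites are outside its scope.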

Comparison of the cucumber and melon chloroplast 
genomes 

As of today, the chloroplast genome of only one cucur- 
bit species, Cucumis sativus (cucumber), has been pub- 
lished [45-47]. Previous studies have suggested that 
sequence analysis of chloroplast genes can be a valuable 



Rodriguez-Moreno et al. BMC Genomics 2011, 12:424
http://www.biomedcentral.com/1471-2164/12/424



Table 2 List of genes found in the Cucumis melo chloroplast genome

RNA genes
tRNAs: trnA-UGC^a,b; trnC-GCA; trnD-GUC; trnE-UUC; trnF-GAA; trnfM-CAU; trnG-GCC; trnG-UCC; trnH-GUG; trnI-CAU^b; trnI-GAU^a,b; trnL-CAA^b; trnL-UAA^a; trnL-UAG; trnM-CAU; trnN-GUU^b; trnP-UGG; trnQ-UUG; trnR-ACG^b; trnR-UCU; trnS-GCU; trnS-GGA; trnS-UGA; trnT-GGU; trnT-UGU; trnV-GAC^b; trnV-UAC; trnW-CCA; trnY-GUA
rRNAs: rrn16^b; rrn23^b; rrn4.5^b

Photosynthesis genes
Acetyl-CoA carboxylase: accD
ATP-dependent protease: clpP^c
ATP synthase: atpA; atpB; atpE; atpF^a; atpH; atpI
Cytochrome b/f: petA; petB^a; petD^a; petG; petL; petN
Cytochrome c biogenesis: ccsA
NADH dehydrogenase: ndhA^a; ndhB^a,b; ndhC; ndhD; ndhE; ndhF; ndhG; ndhH; ndhI; ndhJ; ndhK
Photosystem I: psaA; psaB; psaC; psaI; psaJ
Photosystem II: psbA; psbB; psbC; psbD; psbE; psbF; psbH; psbI; psbJ; psbK; psbL; psbM; psbN; psbT; psbZ
Rubisco: rbcL

Other genes
Conserved ORFs: ycf1; ycf1-like^d; ycf2^b; ycf3^c; ycf4; ORF70^e
Transl. initiation factor: infA
Intron maturase: matK
Membrane protein: cemA
Ribosomal proteins: rpl2^a,b; rpl14; rpl16^a; rpl20; rpl22; rpl23^b; rpl32; rpl33; rpl36; rps2; rps3; rps4; rps7^b; rps8; rps11; rps12^b,c,f; rps14; rps15; rps16^a; rps18; rps19
RNA polymerase: rpoA; rpoB; rpoC1^a; rpoC2

^a Gene that contains one intron
^b Two gene copies due to IR
^c Gene that contains two introns
^d ycf1 spans an inverted repeat region (IRa) and the adjacent small single-copy region (SSC). ycf1-like is a truncated form of ycf1 that occurs in the inverted repeat region IRb.
^e Encodes a putative protein similar to ycf15 from Lactuca sativa (ABD47292.1) and Helianthus annuus (ABD47205.1)
^f Gene that undergoes trans-splicing



tool for phylogenetic studies among closely related spe- 
cies [49,50]. Accordingly, and due to the highly poly- 
morphic nature of the Cucurbitaceae family, a 
comparison of the melon and cucumber chloroplast 
genomes can provide useful information about the evo- 
lutionary relationships among cucurbit species. 

The chloroplast genome sequence of the C. sativus 
'Chipper' line (GenBank Acc. No. DQ865976.1) was 
compared to the melon genomic sequence reported 



here. The cucumber genome was 494 bp shorter than 
the melon genome, but overall only approximately 5% 
of the nucleotide sequences were different, mainly due 
to indels and SNPs (Table 3). Deletions in the melon 
sequence, compared to cucumber, were found at 237 
loci and represented 2,742 bp, or 1.76% of the cucumber 
genome. Eighty-five percent of the deletions involved 
the loss of less than 10 bp, while five deletions repre- 
sented the loss of 125 to 379 bp. Insertions in the 
Table 3 Differences between the C. melo and C. sativus chloroplast genome sequences^a

Deletions^b
length (bp)     number
1               92
2               21
3-4             26
5-6             54
7-9             14
10-19           14
22-84           11
125-270         2
353-379         3
Total: 2,742 (1.76%^c)     237

Insertions^d
length (bp)     number
1               76
2-3             17
4-6             19
7-8             11
9               11
10-17           8
18              6
19-87           12
126             2
147             2
714             2
Total: 3,210 (2.06%^e)     188

SNPs^f
C→A, G→T        507
A→C, T→G        392
C→T, G→A        437
C→G, G→C        254
A→G, T→C        420
A→T, T→A        240
Total: 2,250 (1.44%)

Other polymorphisms     number
GA→TT           1
GTGG→AATC       1
CCAT→TTTA       1
TTAT→AATC       1

Highly polymorphic regions^g
number          8
length          709 bp

^a The chloroplast genome sequence of Cucumis sativus 'Chipper' line (GenBank Acc. No. DQ865976.1) was used for the comparison
^b Positions where the C. melo sequence has a gap in comparison to cucumber
^c Relative to the cucumber genome length
^d Positions where the C. sativus sequence has a gap in comparison to melon
^e Relative to the melon genome length
^f Cucumber → melon
^g Highly divergent regions found between melon coordinates 126,000 and 130,000



melon sequence as compared to cucumber were found 
at 188 loci representing 3,210 bp. Seventy-one percent 
of these insertions involved the gain of less than 10 bp; 
six insertions of 126 to 714 bp were also found. Addi- 
tionally, we identified 2,250 SNPs, which represented 
1.44% of the melon sequence. 
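The percentages given for deletions, insertions and SNPs can be reproduced from the counts in the text, as sketched below. Deletions are expressed relative to the cucumber genome length (melon length minus 494 bp) and insertions and SNPs relative to the melon length, following the footnotes of Table 3. The variable names are illustrative assumptions.

```python
MELON_CP_LENGTH = 156_017                      # nt
CUCUMBER_CP_LENGTH = MELON_CP_LENGTH - 494     # cucumber genome is 494 bp shorter

DELETED_BP = 2_742    # gaps in melon relative to cucumber
INSERTED_BP = 3_210   # gaps in cucumber relative to melon
SNP_COUNT = 2_250


def percent(part, whole):
    return round(100 * part / whole, 2)


deletion_pct = percent(DELETED_BP, CUCUMBER_CP_LENGTH)
insertion_pct = percent(INSERTED_BP, MELON_CP_LENGTH)
snp_pct = percent(SNP_COUNT, MELON_CP_LENGTH)
overall_pct = round(deletion_pct + insertion_pct + snp_pct, 2)

print(deletion_pct, insertion_pct, snp_pct, overall_pct)
```

The three classes add up to slightly more than 5%, in line with the overall divergence of approximately 5% stated above.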



Recombination mechanisms between direct repeat 
sequences on the SSC/IR border regions have been 
found to be responsible for the expansion/contraction of 
the IR sequences, which can create large sequence varia- 
tions in chloroplast genomes [45,51]. Significantly, the 
area of highest diversity between the compared genomes 
was found in the region located between the melon
sequence coordinates 126,000 and 130,000, close to the 
SSC/IRa border. In particular, eight highly polymorphic 
regions with a total length of 709 bp were found in this 
region (Table 3). 

An additional comparison between the amino acid 
sequences of the melon and the cucumber chloroplast- 
encoded proteins was performed, and the results are 
shown in Additional file 1 Table S1. Except for ORF70
and the ycf1-like gene, the annotation of both species
contained the same set of ORFs. Nevertheless, the published
cucumber sequence contains ORFs homologous
to those of melon ORF70 and ycf1.

When the predicted protein sequences were BLASTed 
against the non-redundant GenBank database, cucumber 
was identified as the highest-scoring plant species for 72 
of the 81 predicted coding genes (duplicated genes were 
counted as one gene). With the exception of the rpl22 
and accD genes, which had identity values of 91% and 
82%, respectively, the rest of the 72 genes showed iden- 
tity values higher than 95% when compared to their 
cucumber homologues. 

Five out of nine genes whose highest-scoring match 
was not cucumber showed protein identities higher than 
96%, although the identity values, when compared to 
their cucumber homologues, were also high (Additional 
file 1 Table S1). Finally, the predicted proteins with
lower identity to other plant chloroplast proteins were 
those encoded by the clpP, ycf2 and, particularly, both 
the ycf1 and ycf1-like genes.

Organisation of the Cucumis melo mitochondrial genome 

After the isolation of intact mitochondrial organelles 
from young melon leaves, mtDNA was extracted and 
sequenced using the Roche-454 technology, and the 
104,462 resulting reads were assembled as described in 
the Methods section. BES from two different BAC
libraries [13] and whole genome sequences derived from 
454 sequencing of 3-kb, 8-kb and 20-kb paired-end (PE) 
libraries (unpublished) were also used to improve the 
genome assembly. 

The resulting sequence amounts to 2.74 Mb distributed 
in five scaffolds of lengths 2,428,112 bp, 147,837 bp, 
107,070 bp, 47,488 bp and 6,086 bp and four additional 
unscaffolded contigs that totalled 1,809 bp (Table 4).
The overall sequence coverage is 18-fold. The size of the 
melon mitochondrial genome has previously been esti- 
mated to be approximately 2.4 to 2.9 Mb [27,30]. Based 
on this estimate, we can assume that 95% of the mito- 
chondrial genome has been assembled and that 84% of 
the genome is contained in a single scaffold. Failure to 
assemble all the reads in a single circular sequence can 
be attributed to the high degree of repetitive sequences 
found in this genome, as will be discussed later. However, 



Table 4 C. melo mitochondrial genome characteristics

Total scaffold/contig size [nt]     2,738,402
GC content                          44.5%
Gene number^a                       78
Protein genes^a                     51
rRNA genes^a                        3
tRNA genes^a                        24
Genes with introns                  10
Trans-spliced genes                 3
Coding sequence                     1.68%
Protein coding                      1.37%
tRNAs and rRNAs                     0.31%
Non-coding sequence                 98.32%
cis-spliced introns                 1.80%
Intergenic sequences                96.53%
Repetitive content
SSRs                                0.15%
Transposable-related sequences      0.24%
Any perfect repeats                 42.70%
Tandem repeats                      1.51%
Inverted repeats                    1.85%
Mitochondrial-like^b                4.4%
Chloroplast-like^c                  1.41%
Nuclear-like^d                      46.47%

^a Duplicated and triplicated genes (see Table 5) were counted once
^b Homologous regions between C. melo, C. lanatus and C. pepo mitochondrial genomes
^c Homologous regions between C. melo mitochondrial and chloroplast genomes
^d Homologous regions between C. melo nuclear and mitochondrial genomes

the existence of several subgenomic molecules that coex- 
ist inside the mitochondria, as has been described in 
other species [52-54], cannot be ruled out. The contig 
and scaffold sequences have been deposited in GenBank 
under Accession Numbers JF412792 to JF412800. 
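The assembly figures above can be checked with simple arithmetic. In the sketch below, the scaffold and contig lengths are taken from the text, whereas the 2.9 Mb denominator is an assumption (the upper end of the 2.4 to 2.9 Mb estimate); the text does not state which value was used to derive the quoted 84% and 95%.

```python
SCAFFOLD_LENGTHS = [2_428_112, 147_837, 107_070, 47_488, 6_086]   # bp
UNSCAFFOLDED_CONTIGS_TOTAL = 1_809                                # bp, four contigs
ASSUMED_GENOME_SIZE = 2_900_000   # assumption: upper bound of the 2.4-2.9 Mb estimate

assembled = sum(SCAFFOLD_LENGTHS) + UNSCAFFOLDED_CONTIGS_TOTAL
largest = max(SCAFFOLD_LENGTHS)

assembled_pct = 100 * assembled / ASSUMED_GENOME_SIZE
largest_pct = 100 * largest / ASSUMED_GENOME_SIZE

print(assembled)
print(round(assembled_pct, 1), round(largest_pct, 1))
```

The sum matches the total scaffold/contig size in Table 4 (2,738,402 nt). Under the assumed 2.9 Mb genome size, the largest scaffold corresponds to about 84% of the genome and the whole assembly to roughly 94 to 95%.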

The GC content of the mitochondrial genome was 
found to be 44.5%, which is higher than that of the 
chloroplast and nuclear melon genomes and similar to 
the estimated GC content of the watermelon and squash 
mitochondrial genomes [27]. Annotation of the 
sequence was performed, and 67 genes were detected 
(Tables 4 and 5). Gene-coding regions accounted for 
1.7% of the genome (45,926 bp) and included 36 pro- 
tein-coding genes and 4 conserved ORFs, 3 rRNA genes 
and 24 tRNA genes, which represented 1.3%, 0.1% and 
0.3% of the total sequence, respectively; cis-spliced
introns accounted for 1.8% of the genome. The genes
nad2, cox1, ccmFc, rpl2, rps3 and rps10 contained one
intron, nad4 contained three introns, and nad1, nad5
and nad7 contained four introns each. The nad1, nad2
and nad5 genes were found to undergo trans-splicing.

As of 6 June 2011, the mitochondrial genome 
sequences of 32 Streptophyta have been deposited in 
GenBank [25], including two cucurbit species: C. 
Table 5 List of genes found in Cucumis melo mitochondrial genome

RNA genes
tRNAs: trnD-GTC; trnE-TTC; trnF-GAA; trnfM-CAT; trnG-GCC; trnH-GTG; trnH-GTG-cp^c; trnI-CAT^d; trnL-CAA; trnM-CAT; trnM-CAT-cp^c; trnN-GTT; trnN-GTT-cp^c; trnP-TGG; trnQ-TTG; trnR-ACG; trnR-ACG-cp^c; trnS-GCT; trnS-TGA; trnS-TGA-cp^c; trnW-CCA; trnY-GTA; ψtrnC
rRNAs: rrn26; rrn18; rrn5

Complex I (NADH dehydrogenase): nad1^h,i; nad2^i,j; nad3; nad4^k; nad4L; nad5^h,i; nad6; nad7^h; nad9
Complex II (succinate dehydrogenase): sdh3; sdh4
Complex III (ubiquinol cytochrome c reductase): cob
Complex IV (cytochrome c oxidase): cox1^j; cox2; cox3
ATP synthase: atp1; atp4; atp6; atp8; atp9

Other genes
Cytochrome c biogenesis: ccmB; ccmC; ccmFc^j; ccmFn
Transport membrane: mttB
Maturase: matR
Ribosomal proteins: rpl2^j; rpl5; rpl16; rps1; rps3^j; rps4; rps7; rps10^j; rps12; rps13
Conserved ORFs: ORF1^n; ORF2^o; ORF3^p; ORF4^q

Pseudogenes are symbolised by ψ
^a Three gene copies
^b Two gene copies
^c Chloroplast origin
^d C assumed to be post-transcriptionally modified to lysidine, which pairs with A, not G (see PubMed ID 1698276)
^e Seven gene copies
^f Undetermined anticodon
^g RNA editing creates a codon
^h Gene contains five exons
^i Gene undergoes trans-splicing
^j Gene contains one intron
^k Gene contains four exons
^l Start codon not determined
^m Alternative start codon (see PubMed IDs 8193306 and 9327595)
^n Similar to chloroplast ycf2 gene
^o Similar to ORF150 in V. vinifera, ORF159b in Nicotiana, ORF168 in Marchantia and ORF187 in Physcomitrella
^p Similar to amino acid sequence GenBank ID CAA69750.1
^q Similar to 5' fragment of photosystem I P700 apoprotein A1



lanatus (NC_014043) and C. pepo (NC_014050). Genes
homologous to all the predicted protein-coding genes 
from the watermelon and squash mitochondrial gen- 
omes have been found in the annotated melon 
sequence, with the exception of the rps19 gene. However,
it is already known that this gene has been lost
from the mitochondrial genome in diverse species due 
to transfer to the nucleus; in particular, cucumber, 
which is phylogenetically closer to melon than both 
watermelon and squash, has apparently recently lost this 
gene [55]. Apart from the loss of the rps19 gene, some
differences were found regarding the number of tRNA 
genes in the three cucurbit genomes. For example, while 



two trnQ genes, two trnC genes and one trnK gene 
were found in watermelon, only one trnQ gene and no 
trnC or trnK genes were present in melon. However, it 
is well known that even phylogenetically related species 
differ substantially in their tRNA complement set (for 
example, see [30] for cucurbits). 

When the predicted protein sequences were BLASTed 
against the non-redundant GenBank database, cucurbits 
were identified as the highest-scoring plant species in 
only 16 of all 40 predicted coding genes (Additional file 
2 Table S2). This is in sharp contrast to the chloroplast 
sequences discussed above, in which the majority of 
melon proteins displayed the highest identity values 
when compared to their cucumber homologues. The
identity values of the 16 proteins ranged from 78% (rps3 
protein) to 99% (cob protein). However, RNA editing 
events, which are known to frequently alter mitochon- 
drial transcripts, have not been identified in melon 
except for a limited number of cases. Therefore, the 
actual identity values are expected to be somewhat 
higher than our estimated values. Twenty out of 24 
genes whose highest-scoring match was not a cucurbit 
species showed protein identities higher than 90% for 
the corresponding best hits. Finally, the predicted pro- 
teins with lower identity to other plant mitochondrial 
proteins are those encoded by sdh3, ccmFn, rps1, rps4
and, particularly, ORF2, ORF3 and rps3.

For gene distribution along the mitochondrial chro- 
mosome, several small syntenic clusters are found when 
the melon, watermelon and squash mitochondrial 
sequences are compared (Additional file 3 Figure S1).
However, as has been described for watermelon and 
squash, the distribution of these clusters reveals a high 
level of genomic shuffling and rearrangement between 
these three species [30]. 

Analysis of repetitive DNA, chloroplast and nuclear- 
derived DNA 

Although the gene content of melon is highly similar to 
that of watermelon or squash, the melon mitochondrial 
genome size is thrice that of squash and as much as 
seven times that of watermelon. In fact, regions of DNA 
as large as 600 kb could be found that contained no 
protein-coding genes. Figure 2 shows a schematic repre- 
sentation of the gene density of the largest scaffold (2.43 
Mb). 

To establish the fraction of this huge genome that is 
shared with the other two cucurbit mitochondrial gen- 
omes, all three sequences were cross-compared using 
BLASTn. It has been previously reported that processes 
such as nuclear or chloroplast DNA transfer to the 
mitochondria and internal recombination of the mito- 
chondrial genome lead to a high degree of sequence 
rearrangement that can obscure any trace of homology 
over time [30]. For this reason, a less conservative
e-value of 1E-3 was chosen for the comparative analysis.
As a result, 173 kb (46%) and 163 kb (16.6%) of the 
watermelon and squash mitochondrial genomes, respec- 
tively, were found to be homologous with the melon 
sequence. In addition, 73% of these homologous regions 
(119 kb from watermelon and 125 kb from squash) 
were shared among all three species. Seventy-nine 
regions longer than 500 bp accounted for 60% of the 
total homologous sequence (1,000 homology regions 
averaging 180 bp in length). These figures are in accor- 
dance with the generally accepted theory of watermelon 
being phylogenetically closer to melon than to squash. 



The conserved mitochondrial-like sequence was found 
to contain all the predicted ORFs except for ORF1 and
ORF4 (which are present in conserved regions in melon 
and watermelon, but not squash), and so it can be con- 
cluded that the approximately 120 kb of conserved 
sequence (32%, 12% and 4.4% of the watermelon, squash 
and melon mitochondrial genomes, respectively) repre- 
sented a core cucurbit mitochondrial genome present in 
all three sequenced genomes. Also, the finding that 
approximately 27% of the conserved melon and water- 
melon regions were not conserved in squash, and vice 
versa, points to independent events that have directed 
the evolution of these three genomes from a common 
cucurbit ancestor. In any case, the previous data showed 
that 95% of the melon mitochondrial genome had no 
homology whatsoever with the mitochondrial sequences 
of other cucurbits. 
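The shares of homologous and core sequence discussed above follow directly from the genome sizes. The sketch below recomputes them from the rounded kilobase values quoted in the text; the dictionary names are illustrative assumptions, and small deviations from the published percentages can arise from rounding of the inputs.

```python
GENOME_KB = {"watermelon": 379, "squash": 983, "melon": 2740}
HOMOLOGOUS_TO_MELON_KB = {"watermelon": 173, "squash": 163}
CORE_KB = 120   # approximate sequence conserved in all three genomes


def percent(part, whole):
    return round(100 * part / whole, 1)


homologous_pct = {
    species: percent(kb, GENOME_KB[species])
    for species, kb in HOMOLOGOUS_TO_MELON_KB.items()
}
core_pct = {species: percent(CORE_KB, size) for species, size in GENOME_KB.items()}

print(homologous_pct)
print(core_pct)
```

The core of about 120 kb therefore represents close to a third of the watermelon genome but only about 4.4% of the melon genome, which is the basis for the statement that roughly 95% of the melon sequence has no counterpart in the other two cucurbits.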

Previous reports have indicated that small, repetitive 
DNAs contribute significantly to the expanded mito- 
chondrial genome of cucumber, which is estimated to 
be 1.8 Mb [31]. Therefore, the presence of SSRs, trans- 
posable elements, inverted repeat regions and tandem 
and direct repeats was analysed. The mitochondrial 
sequence contains 357 SSRs (one SSR every 7.7 kb) that
amount to 4,071 bp (0.1% of the total sequence). All
the microsatellites were shorter than 21 bp, except for a
(GACT)7. This SSR content is ten times lower than the
estimated SSR content of the melon nuclear genome [48].
In comparison, the squash and watermelon mitochon- 
drial genomes contain one SSR every 4.6 kb and 5.6 kb, 
respectively (0.3% and 0.2% of the total sequence). 
Therefore, microsatellites represent an insignificant por- 
tion of the melon mitochondrial genome and cannot 
explain its large size. The presence of transposon-related 
sequences was also investigated, but only small frag- 
ments that totalled 6,480 bp (0.23% of the total 
sequence) were found to show homology to transposable 
elements (mainly LTR retrotransposons). The search for 
inverted repeat sequences (IRs) produced 427 pairs of 
IRs, which amounted to 50,601 bp (1.8% of the available 
mitochondrial sequence). Percent matches between IRs 
were higher than 70%, with 137 pairs of IRs showing 
values higher than 95%. The average repeat length was 
82 bp; the longest IR found was 1,067 bp. In compari- 
son, the IR contents of watermelon and squash were 
also calculated, but only 14 IRs (1,497 bp) and 17 IRs 
(2,096 bp) were found in those species, which is 
between four and nine times lower than the melon IR 
content. Therefore, the melon mitochondrial genome 
was significantly enriched in sequences that can mediate 
recombination events. 
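The statement that the watermelon and squash IR contents are between four and nine times lower than that of melon refers to IR content relative to genome size. A minimal sketch of that comparison follows; it uses the base-pair totals from the text and the rounded genome sizes of 379 kb and 983 kb, so the resulting ratios are approximate.

```python
# (total bp in inverted repeats, genome size in bp)
IR_DATA = {
    "melon": (50_601, 2_738_402),
    "watermelon": (1_497, 379_000),
    "squash": (2_096, 983_000),
}

ir_fraction = {species: ir_bp / size for species, (ir_bp, size) in IR_DATA.items()}

# How many times higher the melon IR fraction is than in each other species.
fold_vs_melon = {
    species: ir_fraction["melon"] / fraction
    for species, fraction in ir_fraction.items()
    if species != "melon"
}

for species, fold in fold_vs_melon.items():
    print(species, round(fold, 1))
```

With these inputs the melon IR fraction is between four and five times that of watermelon and between eight and nine times that of squash.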

Regarding the tandem repeat content of the sequenced 
genome, the analysed sequence contained 449 tandem
repeats, which amounted to 41,212 bp or 1.5% of the 
Figure 2 Gene density representation of 2.43 Mb of the melon mitochondrial genome. The displayed region corresponds to the largest
scaffold obtained, which represents 84% of the estimated melon mitochondrial genome. The symbol ^ connects exons of the same gene, while
horizontal lines connect exons of trans-spliced genes. The nad5 gene contains five exons, of which only four are present in the depicted
scaffold.



available sequence. The average period size and period 
copy number were 39 and 3, respectively. The most 
abundant type of tandem repeats were those with period 
sizes of 29, 35 and 70, which totalled 40% of all tandem 
repeats found and 56% of the tandemly distributed
sequence. As a comparison, the tandem repeat contents 
of watermelon and squash were also calculated, and 10 
repeats (1,036 bp or 0.3% of the genome) and 236 
repeats (19,060 bp or 1.9% of the genome), respectively, 
were found. Therefore, while the relative tandem repeat 
contents of melon and squash were similar, watermelon 
showed a significantly reduced tandem repeat content in 
its mitochondrial genome. 

Additionally, the maximal repeat content and repeat 
families of the mitochondrial genomes of watermelon, 
squash and melon were calculated using two different pro- 
grams (see Methods section). The RepeatScout program, 
which detects repeats larger than 50 bp and excludes low 
complexity sequences, predicted 101 repeat families (with 
an average copy number per family of 35) in melon, 13 
families (with an average copy number per family of 34) in 
squash and 2 families (with an average copy number per 
family of 3) in watermelon. The most abundant repeat 



families for the compared mitochondria consisted of 365 
copies of approximately 120-bp-long repeats for melon 
and 90 copies of approximately 173-bp-long repeats for 
squash. Only 3 repeats were found to be longer than the
average read length, which is 399 nt. Incidentally, the fact
that the most abundant repeats are shorter than the
average 454 read length implies that, in those cases, the
454 reads extend beyond the repeats and result in the correct
assembly of reads. Therefore, although the existence of
mis-assemblies of repetitive sequences cannot be completely
ruled out, mis-assemblies probably affect our proposed
sequence to a much lower degree than could be guessed
based only on the high repeat content of the genome.

We also searched for exact repeats longer than 20 bp 
using REPuter (results summarised in Table 6). Similar 
to the findings reported for squash [30], we found a sig- 
nificant content of short repeats in the mitochondrion 
of melon. Our values for squash and watermelon
were slightly lower than those reported in [30]
because we searched only for exact repeats, but the differences
among the genomes analysed are clear. The mitochondrion
of melon is much richer in large repeats than
that of squash.

Table 6 Repeat content in the mitochondria of Cucumis
melo, Cucurbita pepo and Citrullus lanatus

                        Repeat coverage (%)
Repeat length (nt)    C. melo    C. pepo    C. lanatus
20-29                 17.16      15.33      1.65
30-39                  7.12       4.30      0.57
40-49                  3.75       1.60      0.35
> 50                  14.67       4.15      5.76
All                   42.70      25.39      8.33

Chloroplast-derived DNA accounts for as much as 9% 
of sequenced plant mitochondrial genomes [56]. The 
melon chloroplast genome described above was used to 
identify mitochondrial sequences of putative chloroplast 
origin. In all, 35 mitochondrial regions that ranged from 
61 to 10,578 bp (average 1.1 kb) and totalled 38.6 kb or 
1.4% of the mitochondrial genome, showed homology 
with the melon chloroplast sequence. On the other 
hand, 54 kb or 35% of the melon chloroplast genome 
showed homology to the mitochondrial genome. The 
38.6 kb difference in the chloroplast-derived mitochon- 
drial sequence was due to duplicated regions in the 
chloroplast genome. As a comparison, watermelon's 
mitochondrion contains 23 kb of chloroplast-like 
sequences, while squash's mitochondrion contains 113 
kb, which represents approximately 80% of other 
sequenced cucurbit chloroplast genomes such as those 
of melon and cucumber. Therefore, no correlation 
seems to exist between the mitochondrial sizes of these 
three species and their chloroplast-derived sequence 
content. 

Finally, nuclear-derived sequences have been detected 
in several plant mitochondrial genomes and amount to 
up to 7% of their size [30,57]. In watermelon and 
squash, approximately 20 kb of nuclear-like sequences, 
most of which resemble retrotransposable elements, 
have been found. Although the contribution of retrotransposons
to the expanded melon mitochondrial genome
is negligible, as discussed above, a BLAST search of
361 Mb of the melon nuclear genome draft sequence
obtained in our laboratories (unpublished data) against
the mitochondrial sequence identified 1,114 mitochondrial
regions that ranged from 193 bp to 10,355 bp and
that totalled 1,272,615 bp (46.5% of the available mitochondrial
sequence). Significantly, even when only the
413 homologous fragments longer than 1 kb were con- 
sidered, more than 33% of the available mitochondrial 
sequence still showed homology with melon nuclear 
regions. The analysis of those 37 mitochondrial homolo- 
gous regions longer than 4 kb and totalling ca. 200 kb 
showed that the average identity between the mitochon- 
drial and nuclear regions was 91% with values ranging 
from 84 to 96%. The detailed analysis of two of these 
regions, with lengths of 4,220 and 4,044 bp and identities
of 94% and 89% relative to their nuclear counterparts,
showed transition/transversion mutation ratios of 2.2
and 3.8, respectively, with Cmit → Tnuc and Gmit →
Anuc the most abundant mutations found, and Cmit →
Anuc the most representative transversion mutation.
Twenty-seven indels totalling 70 nt and five gaps of
between 11 and 60 nt were also found.
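
For illustration, a transition/transversion ratio of the kind reported above can be obtained from a pairwise alignment by classifying each mismatch as purine-purine or pyrimidine-pyrimidine (transition) or purine-pyrimidine (transversion). The sketch below assumes the two sequences are already aligned with '-' as the gap character; the toy sequences are invented and are not melon data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Columns containing a gap ('-') or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

# Invented toy alignment (mitochondrial vs nuclear copy).
mito = "ACGTACGTAC-GT"
nuc = "ATGTGCGTCCAGA"
ts, tv = ts_tv_counts(mito, nuc)
print(ts, tv, ts / tv)
```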

Interestingly, all but three of the 37 regions analysed
displayed high levels of sequence identity with at least two
different nuclear regions, suggesting a relationship
between the repetitive nuclear DNA and the mitochondrial
DNA of putative nuclear origin.

A large fraction of the mitochondrial gene-containing 
regions and some chloroplast-like regions in the mito- 
chondria showed homology with the nuclear sequence, 
as was expected because many mitochondrial genes 
have homologous counterparts in the nuclear genome. 
Also, DNA transfer from the chloroplast to the mito- 
chondrion has been known to occur. When these 
regions were not considered, 1.14 Mb of mitochondrial 
sequence still showed homology to nuclear sequences. 
In all, nearly half of the melon mitochondrial genome 
seemed to be of nuclear origin; therefore, the transfer of 
DNA from the nucleus can, at least partially, explain the 
large size of this mitochondrial genome. However, the 
nature of approximately 1.5 Mb of mitochondrial 
sequence remains to be elucidated. 

Conclusions 

Whereas the size and gene organisation of the chloroplast
genome were similar among cucurbit species, the
mitochondrial genomes showed a wide variety of sizes,
with a non-conserved structure both in gene number
and organisation, as well as in the features of the non-coding
DNA; nevertheless, we identified a minimum
cucurbit mitochondrial genome core of 119 kb shared by melon,
watermelon and squash with a high level of nucleotide
sequence conservation. In addition to the high proportion
of repetitive DNA in melon compared to
watermelon and squash, the transfer of nuclear DNA
to the melon mitochondrial genome seems to explain
the size of the largest mitochondrial genome reported
so far.

Methods 

Source of the chloroplast genome sequences 

A melon random-shear BAC library had been previously
constructed and the BES from 16,128 clones determined 
[13]. The average sequence length was 534 bp. The BES 
were then filtered using the cucumber chloroplast gen- 
ome sequence (GenBank Acc. No. DQ865976.1) as a 
reference, and 5,785 BES totalling 3.2 Mb were found to 
show homology with the cucumber sequence.

Chloroplast genome assembly, annotation and analysis

The selected BES were assembled using the Sequencher 
4.1.1 software package with a minimum overlap of 15 
and a minimum match of 85%. Due to the presence of 
an inverted repeat in the chloroplast genome of plant 
species, a final step of manual assembly was required to 
obtain a final contig of 5,683 sequences that represented 
the melon chloroplast genome. 

The consensus sequence was then annotated using the 
DOGMA online organellar annotation tool [58]. The 
predicted ORFs, including cis- and trans-splicing sites,
were manually checked by comparison with all other 
published chloroplast genes, and several changes were 
then introduced into the preliminary DOGMA annotation
to produce the final annotated sequence. A graphical
representation of the annotated genome was
produced using the CGView Server [59].

The melon and cucumber chloroplast genome 
sequences were aligned using MEGA4 software to detect 
polymorphisms between these species. The predicted 
chloroplast-encoded proteins were analysed for homol- 
ogy with other known proteins using the GenBank non- 
redundant protein database and the BLASTP software. 
Microsatellites were searched using msatcommander 
0.8.2 software [60]. SSRs considered for the final dataset 
included 1- to 2-nt repeats of at least 10 nt in length 
and 3- to 6-nt repeats with at least four unit repetitions. 
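
The SSR selection criteria above can be expressed as a small regular-expression search. This is a minimal sketch for perfect repeats only and is not the msatcommander algorithm; skipping motifs that are themselves repetitions of a shorter motif (for example "AA" or "ATAT") is an assumption made here so that each tract is reported once, under its shortest motif.

```python
import re

# Minimum number of unit repetitions per motif length, following the text:
# mono-/dinucleotide tracts of at least 10 nt, and tri- to hexanucleotide
# motifs repeated at least four times.
MIN_UNITS = {1: 10, 2: 5, 3: 4, 4: 4, 5: 4, 6: 4}

def is_primitive(unit):
    """True if the motif is not itself a repetition of a shorter motif."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))

def find_ssrs(seq):
    """Return sorted (start, end, motif, units) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for k, min_units in MIN_UNITS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)

# Invented example sequence containing one mono-, one di- and one trinucleotide SSR.
example = "GGC" + "A" * 10 + "CT" + "AG" * 5 + "C" + "ATG" * 4 + "T"
print(find_ssrs(example))
```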

Plant material 

Melon seeds from the double haploid line PIT92
(derived from the cross PI 161375 x T111) were germinated
inside a Petri dish in a dark growth chamber for 3
days at 25°C. After germination, the seeds were planted
in pots that contained synthetic soil and maintained in a
greenhouse at 26 ± 2°C with day/night cycles of 16/8 h.
The PIT92 melon line was also used
for construction of BAC libraries [12,13] and has been
used for whole genome sequencing (Garcia-Mas et al.,
unpublished).

Isolation of mitochondrial DNA from intact mitochondria 

Intact mitochondrial organelles were isolated from 
young melon leaves according to a modification of a 
previously described method [61]. Fifty grams of young 
melon leaves were manually harvested, cut into 10- to 
20-mm lengths and ground in a Polytron PT2000 
homogeniser with 120 ml of grinding medium at 4°C. 
The homogenate was filtered through four layers of 
Miracloth, placed into 6 x 50 ml Nalgene tubes and 
centrifuged for 5 min at 3,200 rpm with a JA14 rotor in 
a Beckman Coulter centrifuge (Avanti J-26 XP). The 
supernatant was then re-centrifuged for 20 min at 
13,600 rpm, and the resulting pellet was resuspended in 
5 to 10 ml of 1x wash buffer, transferred to a 50 ml
Nalgene tube and centrifuged for 5 min at 3,200 rpm
with a JA17 rotor. After centrifugation, the supernatant 
was transferred to a new tube and re-centrifuged at 
13,600 rpm for 20 min. The resulting pellet was thor- 
oughly dispersed with a fine paintbrush in 5 ml of wash- 
ing buffer, layered over a 0 to 5% PVP gradient made 
earlier and centrifuged for 40 min at 21,000 rpm in a 
Beckman Coulter ultracentrifuge (Optima L-90 K). After 
centrifugation, the mitochondria formed a white-yellow 
colour band toward the bottom of the gradient, which 
was carefully recovered with a syringe, transferred to a 
new 50-ml Nalgene tube with 1x wash buffer and concentrated
in a pellet with 3 wash centrifugation steps at
13,600 rpm for 15 min. After organelle isolation, the mitochondria
were lysed and the mitochondrial DNA was purified as described [62].

Mitochondrial genome sequencing and assembly 

Sequencing was performed using the Roche Genome 
Sequencer FLX System on 1/8 of a Titanium Microtitre 
Plate. A total of 120,802 sequences passed the filtering
process; these contained 48,154,028 bases with an
average length of 399 nt. Duplicate reads were identified
using the cd-hit-454 program [63], and 104,462 nonredundant
reads were assembled using Newbler (version
2.5 beta) to produce a set of contigs totalling 2.711 Mb.
The resulting contigs (except for the 64 contigs out of
539 that had < 10x coverage) were used as queries for
BLASTing [64] against additional pools of sequences 
(obtained from the same genotype, PIT92, used in this 
study) that were available in our laboratory: BES from 
two different BAC libraries [13] and whole genome 
sequences derived from Roche 454 sequencing of 3-kb, 
8-kb and 20-kb paired-end libraries (unpublished). Raw 
BESs were filtered and trimmed for quality and vector 
contamination using SeqTrim [65]. Only BESs that had 
> 98% identity to the query for over 80% of their length 
were considered. In cases in which the BESs were paired
(when both 5' and 3' ends of the same BAC insert were
available), both reads of the pair were retained even if only one of them met
the described conditions. In the end, 1,822
BESs (97.5% paired) were used in this study. For the 454
whole genome PEs, we created a database from a subset 
of nonredundant and "true" PEs (sequences that con- 
tained the 454 linker flanked on each side by > 50 nt of 
sequence). We retrieved sequences that had > 99% iden- 
tity to the query and ended up with 10,724 3-kb, 14,723 
8-kb and 2,683 20-kb PEs. 
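
The BES selection rule described above (more than 98% identity over more than 80% of the read, with both mates retained when at least one of them qualifies) could be sketched as follows. The record layout, the read naming and the function names are hypothetical; the original filtering scripts are not described in the text.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class BesHit:
    read_id: str      # hypothetical naming, e.g. "A.F" for the forward end of clone A
    clone_id: str     # BAC clone from which the end sequence was obtained
    identity: float   # percent identity of the best hit against a contig
    aligned_len: int  # aligned length of the read, in nt
    read_len: int     # total trimmed read length, in nt

def passes(hit, min_identity=98.0, min_fraction=0.80):
    """Apply the identity and aligned-fraction thresholds quoted in the text."""
    return hit.identity > min_identity and hit.aligned_len / hit.read_len > min_fraction

def select_bes(hits):
    """Keep every read of a clone if at least one read of that clone passes."""
    by_clone = defaultdict(list)
    for hit in hits:
        by_clone[hit.clone_id].append(hit)
    selected = []
    for clone_hits in by_clone.values():
        if any(passes(h) for h in clone_hits):
            selected.extend(h.read_id for h in clone_hits)
    return sorted(selected)

# Invented example records.
hits = [
    BesHit("A.F", "A", 99.5, 500, 520),  # passes, so its mate is kept as well
    BesHit("A.R", "A", 95.0, 300, 510),
    BesHit("B.F", "B", 97.0, 500, 510),  # neither end of clone B passes
    BesHit("B.R", "B", 99.0, 200, 500),
    BesHit("C.F", "C", 98.5, 450, 500),  # unpaired read that passes
]
print(select_bes(hits))
```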

Assemblies were performed using two different pro- 
grams: Newbler (version 2.5 beta) and MIRA (version 
2). Newbler is able to sort contigs into scaffolds using 
the PEs but is often unable to incorporate conserved 
repeats into these scaffolds, which leaves gaps of 
approximate sizes based on paired-end insert distances
(repeats are often assembled into "collapsed" contigs
that remain orphaned after the assembly). In contrast,
MIRA is unable to build scaffolds, but it tries to differ- 
entiate copies of conserved repeats and include them 
with the rest of non-repeat contigs. Therefore, contigs 
derived from MIRA were used, when possible, to close 
the gaps in the scaffolds obtained with Newbler or to 
join two or more scaffolds. A detailed summary of the
assembly metrics can be found in
Additional file 4: Table S3.

Mitochondrial genome annotation and analysis 

A nucleotide database was built that contained the pre- 
dicted cDNAs from all the sequenced Streptophyta 
mitochondrial genomes, as previously published [25]. 
BLASTN searches were performed, and each individual 
ORF found was checked by comparison with all other 
mitochondrial proteins published; several changes were 
then introduced to produce the final annotated 
sequence. Structural RNA genes were identified using 
tRNAscan-SE 1.21 (for tRNAs) and RNAmmer 1.2 (for 
rRNAs) software [66,67]. 

The predicted mitochondrially encoded proteins were 
analysed for homology with other known proteins using 
the GenBank non-redundant protein database and the 
BLASTP software. Microsatellites were searched using the
msatcommander 0.8.2 software. SSRs considered for the
final dataset included 1- to 2-nt repeats of at least 10 nt
in length and 3- to 6-nt repeats with at least four unit repetitions.
Transposable element-related sequences were identified
using CENSOR online (with default sensitivity parameters
and Arabidopsis thaliana as a reference DNA source) [68].
Tandem repeats were analysed using the Tandem Repeats
Finder software [69] (min. alignment score 60; max. period size
2,000). Inverted repeats were detected using the Inverted
Repeats Finder software [70] (match 2; mismatch 3; delta
5; match probability 80; indel probability 10; Minscore 40;
Maxlength to report 500,000; MaxLoop 500,000). Two different
programs were used to search for duplicated DNA
and to classify repeat families in the sequences of
interest: REPuter and RepeatScout, respectively, with
default parameters [71,72]. Results from REPuter were
analysed to avoid overestimating the total repeat content 
due to repeat overlaps. 
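
One simple way to avoid counting overlapping repeats twice is to merge the repeat intervals before summing their lengths. The sketch below shows this interval-merging idea under the assumption that each repeat copy is given as a half-open (start, end) coordinate pair; it is not necessarily the procedure that was applied to the REPuter output.

```python
def merged_coverage(intervals):
    """Total number of bases covered by possibly overlapping
    half-open (start, end) intervals, counting each base once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered

# Invented coordinates: the naive sum of lengths is 85 bp,
# but only 70 distinct bases are covered.
repeats = [(0, 30), (20, 50), (100, 120), (110, 115)]
print(merged_coverage(repeats))
```

Dividing the merged total by the genome length gives a coverage percentage comparable to the values in Table 6.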

Nuclear-like mitochondrial regions were identified by
performing a BLASTN search with e-value < 1E-100 (corresponding
approximately to a hit of 200 nt and 90% identity
or a 400 nt hit with 85% identity) against a melon
nuclear genome draft that has been produced in our
laboratories [Garcia-Mas et al., manuscript in preparation].
Chloroplast-like regions were identified by performing
a BLASTN analysis with e-value < 1E-40
against the assembled melon chloroplast genome
reported in this paper.
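
As an illustration of how such e-value cutoffs can be applied, the sketch below filters BLASTN hits in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). That the searches were run with tabular output is an assumption, and the hit lines shown are invented.

```python
def filter_blast_hits(lines, max_evalue):
    """Yield (qseqid, qstart, qend) for tabular BLAST hits whose
    e-value is below max_evalue. Query coordinates are returned in
    ascending order."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            qstart, qend = sorted((int(fields[6]), int(fields[7])))
            yield fields[0], qstart, qend

# Invented hits: mitochondrial scaffold as query, nuclear contigs as subjects.
example_hits = [
    "scaffold1\tnuc_ctg5\t91.2\t450\t35\t5\t1000\t1449\t200\t649\t1e-150\t520",
    "scaffold1\tnuc_ctg9\t88.0\t120\t14\t0\t5000\t5119\t10\t129\t2e-30\t130",
]
nuclear_like = list(filter_blast_hits(example_hits, max_evalue=1e-100))
print(nuclear_like)
```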



Comparisons to the C. lanatus and C. pepo mitochondrial
genomes (GenBank Acc. Nos. GQ856147 and
GQ856148) were performed using BLASTN with e-values
< 1E-3.

Additional material 



Additional file 1: Table S1. Protein homologies between C. melo and
other plant chloroplast genomes. 

Additional file 2: Table S2. Protein homologies between C. melo and 
other plant mitochondrial genomes. 

Additional file 3: Figure S1. Syntenic relationships between the
mitochondrial genomes of Cucumis melo, Citrullus lanatus and Cucurbita 
pepo. Only the protein coding regions have been used for this analysis. 
Intronless genes are depicted as orange vertical lines. Individual colours 
are used for the exons of each gene with introns. 

Additional file 4: Table S3. Metrics of the Cucumis melo mitochondrial
genome assembly.